Abstract

Rating-based collaborative filtering is the process of predict- 
ing how a user would rate a given item from other user 
ratings. We propose three related slope one schemes with 
predictors of the form f(x) = x + b, which precompute the
average difference between the ratings of one item and an- 
other for users who rated both. Slope one algorithms are 
easy to implement, efficient to query, reasonably accurate, 
and they support both online queries and dynamic updates, 
which makes them good candidates for real-world systems. 
The basic slope one scheme is suggested as a new ref- 
erence scheme for collaborative filtering. By factoring in 
items that a user liked separately from items that a user dis- 
liked, we achieve results competitive with slower memory- 
based schemes over the standard benchmark EachMovie and 
Movielens data sets while better fulfilling the desiderata of 
CF applications. 

Keywords: Collaborative Filtering, Recommender, e- 
Commerce, Data Mining, Knowledge Discovery 

1 Introduction 

An online rating-based Collaborative Filtering (CF) query
consists of an array of (item, rating) pairs from a single user. 
The response to that query is an array of predicted (item, 
rating) pairs for those items the user has not yet rated. We 
aim to provide robust CF schemes that are: 

1. easy to implement and maintain: all aggregated data 
should be easily interpreted by the average engineer and 
algorithms should be easy to implement and test; 

2. updateable on the fly: the addition of a new rating 
should change all predictions instantaneously; 

3. efficient at query time: queries should be fast, possibly 
at the expense of storage; 

4. expect little from first visitors: a user with few ratings 
should receive valid recommendations; 

5. accurate within reason: the schemes should be competitive
with the most accurate schemes, but a minor gain
in accuracy is not always worth a major sacrifice in
simplicity or scalability.

Figure 1: Basis of Slope One schemes: User A's ratings
of two items and User B's rating of a common item are used
to predict User B's unknown rating.

Our goal in this paper is not to compare the accuracy 
of a wide range of CF algorithms but rather to demonstrate 
that the Slope One schemes simultaneously fulfill all five 
goals. In spite of the fact that our schemes are simple, 
updateable, computationally efficient, and scalable, they are 
comparable in accuracy to schemes that forego some of the 
other advantages. 

Our Slope One algorithms work on the intuitive principle
of a "popularity differential" between items for users.
In a pairwise fashion, we determine how much better one
item is liked than another. One way to measure this
differential is simply to subtract the average ratings of the two items.
In turn, this difference can be used to predict another user's
rating of one of those items, given their rating of the other.
Consider two users A and B, two items I and J, and Fig. 1.
User A gave item I a rating of 1, whereas user B gave it a
rating of 2, while user A gave item J a rating of 1.5. We
observe that item J is rated more than item I by 1.5 - 1 = 0.5
points, thus we could predict that user B will give item J a
rating of 2 + 0.5 = 2.5. We call user B the predictee user and
item J the predictee item. Many such differentials exist in a
training set for each unknown rating and we take an average
of these differentials. The family of slope one schemes
presented here arises from the three ways we select the relevant
differentials to arrive at a single prediction.
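
The two-user example above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the dictionary layout and the helper name `predict_from_differential` are assumptions introduced here, not part of the paper.

```python
# Ratings from the example: user A rated items I and J, user B rated only I.
ratings = {
    "A": {"I": 1.0, "J": 1.5},
    "B": {"I": 2.0},
}

def predict_from_differential(ratings, reference_user, target_user,
                              known_item, unknown_item):
    """Predict target_user's rating of unknown_item from one differential."""
    ref = ratings[reference_user]
    # How much higher the reference user rated unknown_item than known_item.
    differential = ref[unknown_item] - ref[known_item]
    return ratings[target_user][known_item] + differential

prediction = predict_from_differential(ratings, "A", "B", "I", "J")
print(prediction)  # 2.5
```

With many users in the training set, the single differential used here is replaced by an average over all users who rated both items.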

The main contribution of this paper is to present slope
one CF predictors and demonstrate that they are competitive
with memory-based schemes having almost identical accuracy,
while being more amenable to the CF task.

2 Related Work 

2.1 Memory-Based and Model-Based Schemes 

Memory-based collaborative filtering uses a similarity
measure between pairs of users to build a prediction,
typically through a weighted average [2] [12] [13] [18]. The
chosen similarity measure determines the accuracy of the
prediction and numerous alternatives have been studied [8].
Some potential drawbacks of memory-based CF include
scalability and sensitivity to data sparseness. In general,
schemes that rely on similarities across users cannot be
precomputed for fast online queries. Another critical issue
is that memory-based schemes must compute a similarity
measure between users and often this requires that some
minimum number of users (say, at least 100 users) have
entered some minimum number of ratings (say, at least
20 ratings) including the current user. We will contrast
our scheme with a well-known memory-based scheme, the
Pearson scheme.

There are many model-based approaches to CF. Some
are based on linear algebra (SVD, PCA, or Eigenvectors) [3]
[6] [7] [10] [15] [16]; or on techniques borrowed more directly
from Artificial Intelligence such as Bayes methods, Latent
Classes, and Neural Networks [1] [2] [9]; or on clustering [4]
[5]. In comparison to memory-based schemes, model-based
CF algorithms are typically faster at query time though they
might have expensive learning or updating phases.
Model-based schemes can be preferable to memory-based schemes
when query speed is crucial.

We can compare our predictors with certain types of
predictors described in the literature in the following algebraic
terms. Our predictors are of the form f(x) = x + b, hence
the name "slope one", where b is a constant and x is a
variable representing rating values. For any pair of items, we
attempt to find the best function f that predicts one item's
ratings from the other item's ratings. This function could be
different for each pair of items. A CF scheme will weight
the many predictions generated by the predictors. In [14],
the authors considered the correlation across pairs of items
and then derived weighted averages of the user's ratings as
predictors. In the simple version of their algorithm, their
predictors were of the form f(x) = x. In the regression-based
version of their algorithm, their predictors were of the form
f(x) = ax + b. In [17], the authors also employ predictors of
the form f(x) = ax + b. A natural extension of the work in
these two papers would be to consider predictors of the form
f(x) = ax^2 + bx + c. Instead, in this paper, we use naive
predictors of the form f(x) = x + b. We also use naive
weighting. It was observed in [14] that even their regression-based
f(x) = ax + b algorithm didn't lead to large improvements
over memory-based algorithms. It is therefore a significant
result to demonstrate that a predictor of the form f(x) = x + b
can be competitive with memory-based schemes.

3 CF Algorithms 

We propose three new CF schemes, and contrast our pro- 
posed schemes with four reference schemes: Per User 
Average, Bias From Mean, Adjusted Cosine Item- 
Based, which is a model-based scheme, and the Pearson 
scheme, which is representative of memory-based schemes. 

3.1 Notation We use the following notation in describing
schemes. The ratings from a given user, called an evaluation,
are represented as an incomplete array $u$, where $u_i$ is the rating
this user gives to item $i$. The subset of the set of items
consisting of all those items which are rated in $u$ is $S(u)$. The
set of all evaluations in the training set is $\chi$. The number
of elements in a set $S$ is $\mathrm{card}(S)$. The average of ratings in
an evaluation $u$ is denoted $\bar{u}$. The set $S_i(\chi)$ is the set of all
evaluations $u \in \chi$ such that they contain item $i$ ($i \in S(u)$).
Given two evaluations $u, v$, we define the scalar product
$\langle u, v \rangle$ as $\sum_{i \in S(u) \cap S(v)} u_i v_i$. Predictions, which we write $P(u)$,
represent a vector where each component is the prediction
corresponding to one item: predictions depend implicitly on
the training set $\chi$.
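
To make the notation concrete, here is a minimal Python sketch under the assumption that an evaluation is stored as a dictionary from item to rating. The function names (`rated_items`, `mean_rating`, `evaluations_containing`, `scalar_product`) and the sample ratings are hypothetical and chosen only for illustration.

```python
# An evaluation u is an incomplete array: here, a dict mapping item -> rating.
u = {"i1": 4.0, "i2": 2.0, "i3": 3.0}
v = {"i2": 5.0, "i3": 1.0, "i4": 2.0}
training_set = [u, v]  # plays the role of the training set chi

def rated_items(evaluation):
    """S(u): the set of items rated in the evaluation."""
    return set(evaluation)

def mean_rating(evaluation):
    """u-bar: the average of the ratings in the evaluation."""
    return sum(evaluation.values()) / len(evaluation)

def evaluations_containing(item, evaluations):
    """S_i(chi): all evaluations in the training set that rate the item."""
    return [e for e in evaluations if item in e]

def scalar_product(a, b):
    """<a, b>: sum of a_i * b_i over the items rated in both evaluations."""
    return sum(a[i] * b[i] for i in rated_items(a) & rated_items(b))

print(mean_rating(u))        # 3.0
print(scalar_product(u, v))  # 13.0
```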

3.2 Baseline Schemes One of the most basic prediction
algorithms is the Per User Average scheme given by
the equation $P(u) = \bar{u}$. That is, we predict that a user
will rate everything according to that user's average rating.
Another simple scheme is known as Bias From Mean (or
sometimes Non Personalized [8]). It is given by

$$P(u)_i = \bar{u} + \frac{1}{\mathrm{card}(S_i(\chi))} \sum_{v \in S_i(\chi)} (v_i - \bar{v}).$$

That is, the prediction is based on the user's average plus the
average deviation from the user mean for the item in question
across all users in the training set. We also compare to the
item-based approach that is reported to work best [14], which
uses the following adjusted cosine similarity measure, given
two items $i$ and $j$:

$$\mathrm{sim}_{i,j} = \frac{\sum_{u \in S_{i,j}(\chi)} (u_i - \bar{u})(u_j - \bar{u})}{\sqrt{\sum_{u \in S_{i,j}(\chi)} (u_i - \bar{u})^2 \sum_{u \in S_{i,j}(\chi)} (u_j - \bar{u})^2}}$$

The prediction is obtained as a weighted sum of these
measures thus:

$$P(u)_i = \frac{\sum_{j \in S(u)} |\mathrm{sim}_{i,j}| (\alpha_{i,j} u_j + \beta_{i,j})}{\sum_{j \in S(u)} |\mathrm{sim}_{i,j}|}$$

where the regression coefficients $\alpha_{i,j}$, $\beta_{i,j}$ are chosen so as to
minimize $\sum_{u \in S_{i,j}(\chi)} (\alpha_{i,j} u_j + \beta_{i,j} - u_i)^2$ with $i$ and $j$ fixed.
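
The two simplest baselines can be sketched as follows, assuming evaluations are dictionaries from item to rating. The function names and the fallback used when nobody in the training set rated the item are assumptions made for this illustration, not choices stated in the paper.

```python
def mean_rating(evaluation):
    """Average of the ratings in one evaluation."""
    return sum(evaluation.values()) / len(evaluation)

def per_user_average(u, item):
    """Per User Average: predict the user's own mean for every item."""
    return mean_rating(u)

def bias_from_mean(u, item, training_set):
    """Bias From Mean: user mean plus the item's average deviation from
    each rater's mean, taken over all training evaluations rating the item."""
    raters = [v for v in training_set if item in v]
    if not raters:
        return mean_rating(u)  # assumption: fall back to the user average
    bias = sum(v[item] - mean_rating(v) for v in raters) / len(raters)
    return mean_rating(u) + bias

training_set = [
    {"a": 5.0, "b": 3.0},  # mean 4.0, item "a" deviates by +1.0
    {"a": 4.0, "c": 2.0},  # mean 3.0, item "a" deviates by +1.0
]
u = {"b": 2.0, "c": 4.0}   # mean 3.0

print(per_user_average(u, "a"))              # 3.0
print(bias_from_mean(u, "a", training_set))  # 4.0
```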



3.3 The Pearson Reference Scheme Since we
wish to demonstrate that our schemes are comparable in
predictive power to memory-based schemes, we choose to
implement one such scheme as representative of the class,
acknowledging that there are many documented schemes of
this type. Among the most popular and accurate memory-based
schemes is the Pearson scheme [13]. It takes the
form of a weighted sum over all users in $\chi$

$$P(u)_i = \bar{u} + \frac{\sum_{v \in S_i(\chi)} \gamma(u, v)(v_i - \bar{v})}{\sum_{v \in S_i(\chi)} |\gamma(u, v)|}$$

where $\gamma$ is a similarity measure computed from Pearson's
correlation:

$$\mathrm{Corr}(u, w) = \frac{\langle u - \bar{u}, w - \bar{w} \rangle}{\sqrt{\sum_{i \in S(u) \cap S(w)} (u_i - \bar{u})^2 \sum_{i \in S(u) \cap S(w)} (w_i - \bar{w})^2}}.$$

Following [2] [8], we set

$$\gamma(u, w) = \mathrm{Corr}(u, w) \, |\mathrm{Corr}(u, w)|^{\rho - 1}$$

with $\rho = 2.5$, where $\rho$ is the Case Amplification power. Case
Amplification reduces noise in the data: if the correlation is
high, say Corr = 0.9, then it remains high ($0.9^{2.5} \approx 0.8$) after
Case Amplification whereas if it is low, say Corr = 0.1, then
it becomes negligible ($0.1^{2.5} \approx 0.003$). Pearson's correlation
together with Case Amplification is shown to be a reasonably
accurate memory-based scheme for CF in [2] though more
accurate schemes exist.
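
A rough Python sketch of the Pearson scheme with Case Amplification is given below, assuming dictionary evaluations and user means taken over all of a user's ratings. The function names and the fallback to the user mean when all weights vanish are assumptions of this sketch.

```python
import math

def mean_rating(e):
    return sum(e.values()) / len(e)

def pearson_corr(u, w):
    """Corr(u, w), summed over the items rated in both evaluations."""
    common = set(u) & set(w)
    mu, mw = mean_rating(u), mean_rating(w)
    num = sum((u[i] - mu) * (w[i] - mw) for i in common)
    den = math.sqrt(sum((u[i] - mu) ** 2 for i in common)
                    * sum((w[i] - mw) ** 2 for i in common))
    return num / den if den else 0.0

def case_amplify(corr, rho=2.5):
    """gamma = Corr * |Corr|^(rho - 1): keeps the sign, shrinks weak values."""
    return corr * abs(corr) ** (rho - 1)

def pearson_predict(u, item, training_set, rho=2.5):
    """User mean plus the gamma-weighted average deviation of other raters."""
    raters = [v for v in training_set if item in v]
    weights = [case_amplify(pearson_corr(u, v), rho) for v in raters]
    norm = sum(abs(g) for g in weights)
    if norm == 0:
        return mean_rating(u)  # assumption: fall back to the user average
    weighted = sum(g * (v[item] - mean_rating(v))
                   for g, v in zip(weights, raters))
    return mean_rating(u) + weighted / norm

print(round(case_amplify(0.9), 3))  # 0.768
print(round(case_amplify(0.1), 4))  # 0.0032
```

Note that every query has to recompute correlations against the training evaluations, which is the query-time cost the slope one schemes are designed to avoid.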

3.4 The Slope One Scheme The slope one schemes
take into account both information from other users who
rated the same item (like the Adjusted Cosine Item-Based)
and from the other items rated by the same user
(like the Per User Average). However, the schemes also
rely on data points that fall neither in the user array nor in
the item array (e.g. user A's rating of item I in Fig. 1), but
are nevertheless important information for rating prediction.
Much of the strength of the approach comes from data that
is not factored in. Specifically, only those ratings by users
who have rated some common item with the predictee user
and only those ratings of items that the predictee user has
also rated enter into the prediction of ratings under slope one
schemes.

Formally, given two evaluation arrays $v_i$ and $w_i$ with $i = 1, \ldots, n$,
we search for the best predictor of the form $f(x) = x + b$
to predict $w$ from $v$ by minimizing $\sum_i (v_i + b - w_i)^2$.
Differentiating with respect to $b$ and setting the derivative to zero,
we get $b = \frac{\sum_i (w_i - v_i)}{n}$. In other words, the constant $b$ must be
chosen to be the average difference between the two arrays.
This result motivates the following scheme.
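
For reference, the minimization can be written out step by step. The derivation below uses only the quantities named in the text and writes $n$ for the number of entries in the two arrays, which is a symbol assumed here for convenience.

```latex
\begin{align*}
\frac{d}{db} \sum_{i=1}^{n} (v_i + b - w_i)^2
  &= 2 \sum_{i=1}^{n} (v_i + b - w_i) = 0 \\
\Longrightarrow \quad n b &= \sum_{i=1}^{n} (w_i - v_i) \\
\Longrightarrow \quad b &= \frac{1}{n} \sum_{i=1}^{n} (w_i - v_i).
\end{align*}
```

The second derivative with respect to $b$ is $2n > 0$, so this stationary point is the minimum.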

Given a training set $\chi$ and any two items $j$ and $i$ with
ratings $u_j$ and $u_i$ respectively in some user evaluation $u$
(annotated as $u \in S_{j,i}(\chi)$), we consider the average deviation
of item $i$ with respect to item $j$ as:

$$\mathrm{dev}_{j,i} = \sum_{u \in S_{j,i}(\chi)} \frac{u_j - u_i}{\mathrm{card}(S_{j,i}(\chi))}.$$

Note that any user evaluation $u$ not containing both $u_j$ and
$u_i$ is not included in the summation. The matrix
defined by $\mathrm{dev}_{j,i}$ is skew-symmetric ($\mathrm{dev}_{i,j} = -\mathrm{dev}_{j,i}$); it can be
computed once and updated quickly when new data is entered.
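
One way to keep the deviations updateable is to store, for each ordered pair of items, a running sum of differences and a count, and to divide only when a deviation is requested. The class below is a sketch under that assumption; the name `DeviationTable` and its methods are hypothetical.

```python
from collections import defaultdict

class DeviationTable:
    """Per ordered item pair (j, i): running sum of u_j - u_i and the number
    of evaluations that rate both items, i.e. card(S_{j,i})."""

    def __init__(self):
        self.total = defaultdict(float)
        self.count = defaultdict(int)

    def add_evaluation(self, u):
        # A new evaluation only touches the pairs of items it rates.
        for j, uj in u.items():
            for i, ui in u.items():
                if i != j:
                    self.total[(j, i)] += uj - ui
                    self.count[(j, i)] += 1

    def dev(self, j, i):
        """Average of u_j - u_i over evaluations rating both j and i."""
        n = self.count.get((j, i), 0)
        if n == 0:
            raise KeyError(f"no evaluation rates both {j!r} and {i!r}")
        return self.total[(j, i)] / n

table = DeviationTable()
table.add_evaluation({"I": 1.0, "J": 1.5})
table.add_evaluation({"I": 2.0, "J": 2.0, "K": 4.0})
print(table.dev("J", "I"))  # 0.25
print(table.dev("I", "J"))  # -0.25
```

Adding an evaluation with k ratings touches k(k - 1) ordered pairs, independent of the size of the training set.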

Given that $\mathrm{dev}_{j,i} + u_i$ is a prediction for $u_j$ given $u_i$,
a reasonable predictor might be the average of all such
predictions

$$P(u)_j = \frac{1}{\mathrm{card}(R_j)} \sum_{i \in R_j} (\mathrm{dev}_{j,i} + u_i)$$

where $R_j = \{ i \mid i \in S(u), i \neq j, \mathrm{card}(S_{j,i}(\chi)) > 0 \}$ is the set
of all relevant items. There is an approximation that can
simplify the calculation of this prediction. For a dense
enough data set where almost all pairs of items have ratings,
that is, where $\mathrm{card}(S_{j,i}(\chi)) > 0$ for almost all $i, j$, most
of the time $R_j = S(u)$ for $j \notin S(u)$ and $R_j = S(u) - \{j\}$
when $j \in S(u)$. Since $\bar{u} = \sum_{i \in S(u)} \frac{u_i}{\mathrm{card}(S(u))} \simeq \sum_{i \in R_j} \frac{u_i}{\mathrm{card}(R_j)}$
for most $j$, we can simplify the prediction formula for the
Slope One scheme to

$$P^{S1}(u)_j = \bar{u} + \frac{1}{\mathrm{card}(R_j)} \sum_{i \in R_j} \mathrm{dev}_{j,i}.$$

It is interesting to note that our implementation of 
Slope One doesn't depend on how the user rated individual 
items, but only on the user's average rating and crucially on 
which items the user has rated. 
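
Both forms of the Slope One prediction can be sketched as follows, assuming dictionary evaluations and deviations keyed by ordered item pairs. The function names, the sample ratings, and the choice to return `None` when no relevant item exists are assumptions of this sketch.

```python
def build_deviations(training_set):
    """Return (dev, count) keyed by (j, i): average of v_j - v_i and the
    number of evaluations rating both items."""
    total, count = {}, {}
    for v in training_set:
        for j in v:
            for i in v:
                if i != j:
                    total[(j, i)] = total.get((j, i), 0.0) + v[j] - v[i]
                    count[(j, i)] = count.get((j, i), 0) + 1
    dev = {pair: total[pair] / count[pair] for pair in total}
    return dev, count

def relevant_items(u, j, count):
    """R_j: items rated by u, other than j, that co-occur with j in training."""
    return [i for i in u if i != j and count.get((j, i), 0) > 0]

def slope_one_predict(u, j, dev, count):
    """Average of dev[j, i] + u_i over the relevant items."""
    r = relevant_items(u, j, count)
    if not r:
        return None  # assumption: no prediction without relevant items
    return sum(dev[(j, i)] + u[i] for i in r) / len(r)

def slope_one_predict_simplified(u, j, dev, count):
    """Dense-data approximation: user mean plus the average deviation."""
    r = relevant_items(u, j, count)
    if not r:
        return None
    mean_u = sum(u.values()) / len(u)
    return mean_u + sum(dev[(j, i)] for i in r) / len(r)

training_set = [{"I": 1.0, "J": 1.5, "K": 3.0}, {"I": 2.0, "J": 2.0}]
dev, count = build_deviations(training_set)
u = {"I": 2.0, "K": 4.0}
print(slope_one_predict(u, "J", dev, count))             # 2.375
print(slope_one_predict_simplified(u, "J", dev, count))  # 2.375
```

The two functions agree here because every item the user rated is relevant; they can differ when some rated item never co-occurs with the predicted item in the training set.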

3.5 The Weighted Slope One Scheme One
of the drawbacks of Slope One is that the number of
ratings observed is not taken into consideration. Intuitively,
to predict user A's rating of item L given user A's rating of
items J and K, if 2000 users rated the pair of items J and
L whereas only 20 users rated the pair of items K and L,
then user A's rating of item J is likely to be a far better
predictor for item L than user A's rating of item K is. Thus,
we define the Weighted Slope One prediction as the
following weighted average

$$P^{wS1}(u)_j = \frac{\sum_{i \in S(u) - \{j\}} (\mathrm{dev}_{j,i} + u_i) c_{j,i}}{\sum_{i \in S(u) - \{j\}} c_{j,i}}$$

where $c_{j,i} = \mathrm{card}(S_{j,i}(\chi))$.
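
The weighted average can be sketched in Python as below, again assuming dictionary evaluations. The helper names and the toy ratings are made up for illustration, and returning `None` when no pair has support is an assumption of this sketch.

```python
def build_deviations(training_set):
    """Return (dev, count) keyed by (j, i): average of v_j - v_i and the
    number of evaluations rating both items."""
    total, count = {}, {}
    for v in training_set:
        for j in v:
            for i in v:
                if i != j:
                    total[(j, i)] = total.get((j, i), 0.0) + v[j] - v[i]
                    count[(j, i)] = count.get((j, i), 0) + 1
    dev = {pair: total[pair] / count[pair] for pair in total}
    return dev, count

def weighted_slope_one_predict(u, j, dev, count):
    """Each per-item prediction dev[j, i] + u_i is weighted by the number
    of training evaluations that rated both j and i."""
    num = den = 0.0
    for i, ui in u.items():
        c = count.get((j, i), 0)
        if i != j and c > 0:
            num += (dev[(j, i)] + ui) * c
            den += c
    return num / den if den else None

training_set = [{"I": 1.0, "J": 1.5, "K": 3.0}, {"I": 2.0, "J": 2.0}]
dev, count = build_deviations(training_set)
u = {"I": 2.0, "K": 4.0}
# Pair (J, I) is supported by 2 evaluations, pair (J, K) by only 1,
# so the prediction from item I (2.25) counts twice as much as that from K (2.5).
print(weighted_slope_one_predict(u, "J", dev, count))
```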

3.6 The Bi-Polar Slope One Scheme While
weighting served to favor frequently occurring rating
patterns over infrequent rating patterns, we will now consider
favoring another kind of especially relevant rating pattern.
We accomplish this by splitting the prediction into two parts.
Using the Weighted Slope One algorithm, we derive one
prediction from items users liked and another prediction
using items that users disliked.

Given a rating scale, say from 0 to 10, it might seem
reasonable to use the middle of the scale, 5, as the threshold
and to say that items rated above 5 are liked and those rated
below 5 are not. This would work well if a user's ratings are
distributed evenly. However, more than 70% of all ratings
in the EachMovie data are above the middle of the scale.
Because we want to support all types of users including
balanced, optimistic, pessimistic, and bimodal users, we
apply the user's average as a threshold between the user's
liked and disliked items. For example, optimistic users, who
like every item they rate, are assumed to dislike the items
rated below their average rating. This threshold ensures that
our algorithm has a reasonable number of liked and disliked
items for each user.

Referring again to Fig. 1, as usual we base our prediction
for item J by user B on deviation from item I of users (like
user A) who rated both items I and J. The Bi-Polar Slope
One scheme restricts further than this the set of ratings
that are predictive. First, in terms of items, only deviations
between two liked items or deviations between two disliked
items are taken into account. Second, in terms of users, only
deviations from pairs of users who rated both item I and J
and who share a like or dislike of item I are used to predict
ratings for item J.

The splitting of each user into user likes and user dis- 
likes effectively doubles the number of users. Observe, how- 
ever, that the bi-polar restrictions just outlined necessarily 
reduce the overall number of ratings in the calculation of 
the predictions. Although any improvement in accuracy in 
light of such reduction may seem counter-intuitive where 
data sparseness is a problem, failing to filter out ratings that 
are irrelevant may prove even more problematic. Crucially, 
the Bi-Polar Slope One scheme predicts nothing from
the fact that user A likes item K and user B dislikes this same 
item K. 

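Formally, we split each evaluation $u$ into two sets of
rated items: $S^{like}(u) = \{ i \in S(u) \mid u_i > \bar{u} \}$ and $S^{dislike}(u) =
\{ i \in S(u) \mid u_i < \bar{u} \}$. And for each pair of items $i, j$, split the
set of all evaluations $\chi$ into $S^{like}_{i,j} = \{ u \in \chi \mid i, j \in S^{like}(u) \}$ and
$S^{dislike}_{i,j} = \{ u \in \chi \mid i, j \in S^{dislike}(u) \}$. Using these two sets, we
compute the following deviation matrix for liked items as
well as the deviation matrix $\mathrm{dev}^{dislike}_{j,i}$:

$$\mathrm{dev}^{like}_{j,i} = \sum_{u \in S^{like}_{j,i}(\chi)} \frac{u_j - u_i}{\mathrm{card}(S^{like}_{j,i}(\chi))}.$$

The prediction for rating of item $j$ based on rating of item $i$ is
either $p^{like}_{j,i} = \mathrm{dev}^{like}_{j,i} + u_i$ or $p^{dislike}_{j,i} = \mathrm{dev}^{dislike}_{j,i} + u_i$, depending
on whether $i$ belongs to $S^{like}(u)$ or $S^{dislike}(u)$ respectively.
The Bi-Polar Slope One scheme is given by

$$P^{bpS1}(u)_j = \frac{\sum_{i \in S^{like}(u) - \{j\}} p^{like}_{j,i} c^{like}_{j,i} + \sum_{i \in S^{dislike}(u) - \{j\}} p^{dislike}_{j,i} c^{dislike}_{j,i}}{\sum_{i \in S^{like}(u) - \{j\}} c^{like}_{j,i} + \sum_{i \in S^{dislike}(u) - \{j\}} c^{dislike}_{j,i}}$$

where the weights $c^{like}_{j,i} = \mathrm{card}(S^{like}_{j,i})$ and $c^{dislike}_{j,i} =
\mathrm{card}(S^{dislike}_{j,i})$ are similar to the ones in the Weighted
Slope One scheme.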
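
The bi-polar split can be sketched as two deviation tables, one built from the liked half of every training evaluation and one from the disliked half. The code below assumes dictionary evaluations; all function names and the sample ratings are hypothetical, and ratings exactly equal to the user's mean fall in neither half, following the strict inequalities above.

```python
def mean_rating(evaluation):
    return sum(evaluation.values()) / len(evaluation)

def split_evaluation(u):
    """Return (liked, disliked): ratings strictly above / below the mean."""
    m = mean_rating(u)
    liked = {i: r for i, r in u.items() if r > m}
    disliked = {i: r for i, r in u.items() if r < m}
    return liked, disliked

def build_deviations(evaluations):
    """Return (dev, count) keyed by (j, i) for the given evaluations."""
    total, count = {}, {}
    for v in evaluations:
        for j in v:
            for i in v:
                if i != j:
                    total[(j, i)] = total.get((j, i), 0.0) + v[j] - v[i]
                    count[(j, i)] = count.get((j, i), 0) + 1
    dev = {pair: total[pair] / count[pair] for pair in total}
    return dev, count

def build_bipolar_tables(training_set):
    """One (dev, count) table from liked halves, one from disliked halves."""
    halves = [split_evaluation(v) for v in training_set]
    like = build_deviations([liked for liked, _ in halves])
    dislike = build_deviations([disliked for _, disliked in halves])
    return like, dislike

def bipolar_slope_one_predict(u, j, like, dislike):
    """Weighted average of like-based and dislike-based predictions."""
    liked, disliked = split_evaluation(u)
    num = den = 0.0
    for items, (dev, count) in ((liked, like), (disliked, dislike)):
        for i, ui in items.items():
            c = count.get((j, i), 0)
            if i != j and c > 0:
                num += (dev[(j, i)] + ui) * c
                den += c
    return num / den if den else None

training_set = [{"I": 1.0, "J": 2.0, "K": 5.0, "L": 4.0}]  # mean 3.0
like, dislike = build_bipolar_tables(training_set)
u = {"I": 2.0, "K": 5.0}  # mean 3.5: likes K, dislikes I
print(bipolar_slope_one_predict(u, "L", like, dislike))  # 4.0
print(bipolar_slope_one_predict(u, "J", like, dislike))  # 3.0
```

In this toy data, item L is predicted only from the liked item K and item J only from the disliked item I, since no training evaluation places L with I, or J with K, in the same half.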

4 Experimental Results 

The effectiveness of a given CF algorithm can be measured
precisely. To do so, we have used the All But One Mean
Average Error (MAE) [2]. In computing MAE, we successively
hide ratings one at a time from all evaluations in the
test set while predicting the hidden rating, computing the
average error we make in the prediction. Given a predictor $P$
and an evaluation $u$ from a user, the error rate of $P$ over a set
of evaluations $\chi'$ is given by

$$\mathrm{MAE} = \frac{1}{\mathrm{card}(\chi')} \sum_{u \in \chi'} \frac{1}{\mathrm{card}(S(u))} \sum_{i \in S(u)} |P(u^{(i)})_i - u_i|$$

where $u^{(i)}$ is user evaluation $u$ with that user's rating of the
$i$th item, $u_i$, hidden.
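
The All But One procedure can be sketched as a short loop. In the code below, `predict` is any function taking an evaluation with one rating hidden and the hidden item; the toy predictor and test ratings are made up for illustration and are not results from the paper.

```python
def all_but_one_mae(predict, test_set):
    """Hide each rating in turn, predict it from the remaining ratings, and
    average the absolute errors first per evaluation, then over evaluations."""
    per_evaluation = []
    for u in test_set:
        errors = []
        for i, ui in u.items():
            hidden = {k: r for k, r in u.items() if k != i}  # u with u_i hidden
            errors.append(abs(predict(hidden, i) - ui))
        per_evaluation.append(sum(errors) / len(errors))
    return sum(per_evaluation) / len(per_evaluation)

def per_user_average(u, item):
    """Toy predictor: the mean of the ratings that remain visible."""
    return sum(u.values()) / len(u)

# Every test evaluation needs at least two ratings for this toy predictor.
test_set = [
    {"a": 1.0, "b": 3.0},
    {"a": 2.0, "b": 2.0, "c": 5.0},
]
print(all_but_one_mae(per_user_average, test_set))  # 2.0
```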

We test our schemes over the EachMovie data set made
available by Compaq Research and over the Movielens data
set from the Grouplens Research Group at the University of
Minnesota. The data is collected from movie rating web sites
where ratings range from 0.0 to 1.0 in increments of 0.2 for
EachMovie and from 1 to 5 in increments of 1 for Movielens.
Following [8] [11], we used enough evaluations to have a total
of 50,000 ratings as a training set ($\chi$) and an additional set of
evaluations with a total of at least 100,000 ratings as the test
set ($\chi'$). When predictions fall outside the range of allowed
ratings for the given data set, they are corrected accordingly:
a prediction of 1.2 on a scale from 0 to 1 for EachMovie is
interpreted as a prediction of 1. Since Movielens has a rating
scale 4 times larger than EachMovie, MAEs from Movielens
were divided by 4 to make the results directly comparable.
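
The two post-processing steps just described, correcting out-of-range predictions and rescaling errors, amount to the small helpers below. The function names are hypothetical, and the MAE value of 0.8 is an arbitrary illustration rather than a measured result.

```python
def clip_prediction(p, lowest, highest):
    """Bring a prediction back inside the allowed rating range."""
    return min(max(p, lowest), highest)

def normalized_mae(mae, lowest, highest):
    """Rescale an MAE to a rating scale of width 1 by dividing by the width."""
    return mae / (highest - lowest)

print(clip_prediction(1.2, 0.0, 1.0))  # 1.0
print(normalized_mae(0.8, 1, 5))       # an MAE of 0.8 on the 1-to-5 scale becomes 0.2
```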

The results for the various schemes using the same
error measure and over the same data set are summarized
in Table 1. Various subresults are highlighted in the Figures
that follow.

Consider the results of testing various baseline schemes.
As expected, we found that Bias From Mean performed
the best of the three reference baseline schemes described in
section 3.2. Interestingly, however, the basic Slope One
scheme described in section 3.4 had a higher accuracy than
Bias From Mean.

The augmentations to the basic Slope One described
in sections 3.5 and 3.6 do improve accuracy over EachMovie.
There is a small difference between Slope One and
Weighted Slope One (about 1%). Splitting dislike and
like ratings improves the results 1.5-2%.

Scheme                         EachMovie   Movielens
Bi-Polar Slope One             0.194       0.188
Weighted Slope One             0.198       0.188
Slope One                      0.200       0.188
Bias From Mean                 0.203       0.191
Adjusted Cosine Item-Based     0.209       0.198
Per User Average               0.231       0.208
Pearson                        0.194       0.190

Table 1: All Schemes Compared: All But One Mean Average
Error Rates for the EachMovie and Movielens data sets,
lower is better.

Finally, compare the memory-based Pearson scheme 
on the one hand and the three slope one schemes on the other. 
The slope one schemes achieve an accuracy comparable to 
that of the Pearson scheme. This result is sufficient to 
support our claim that slope one schemes are reasonably 
accurate despite their simplicity and their other desirable
characteristics. 

5 Conclusion 

This paper shows that an easy to implement CF model based
on average rating differential can compete against more
expensive memory-based schemes. In contrast to currently
used schemes, we are able to meet five adversarial goals with
our approach. Slope One schemes are easy to implement,
dynamically updateable, efficient at query time, and expect
little from first visitors while having a comparable accuracy
(e.g. 0.190 vs. 0.188 MAE for Movielens) to other commonly
reported schemes. This is remarkable given the relative 
complexity of the memory-based scheme under comparison. 
A further innovation of our approach is that splitting ratings
into dislike and like subsets can be an effective technique
for improving accuracy. It is hoped that the generic slope
one predictors presented here will prove useful to the CF
community as a reference scheme.

Note that as of November 2004, the 
Weighted Slope One is the collaborative filtering 
algorithm used by the Bell/MSN Web site inDiscover.net. 